Review for NeurIPS paper: STEER: Simple Temporal Regularization For Neural ODE

Neural Information Processing Systems

Additional Feedback: While I have raised severe objections, I still believe that the method itself may have strong merits. Please consider the following questions and suggestions:

a) If the authors deem the theorem really necessary, it needs to be made clearer why the standard Picard-Lindelöf theorem does not apply; maybe I misunderstood something on a very fundamental level. If not, simply consider removing the section. In any case, I don't think the method necessarily requires a stiffness discussion.

b) Is there a dependence between suitable ranges of the parameter b and different solvers/model architectures?


STEER: Simple Temporal Regularization For Neural ODE

Neural Information Processing Systems

Training Neural Ordinary Differential Equations (ODEs) is often computationally expensive: the forward pass of such models requires solving an ODE whose dynamics can become arbitrarily complex during training. Recent works have shown that regularizing the dynamics of the ODE can partially alleviate this. In this paper we propose a new regularization technique: randomly sampling the integration end time of the ODE during training. The proposed regularization is simple to implement, has negligible overhead, and is effective across a wide variety of tasks.
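Concretely, the end-time sampling can be sketched in a few lines. The snippet below is a minimal illustration assuming the torchdiffeq package; the `ODEFunc` network, the nominal horizon `T`, and the sampling amplitude `b` are illustrative placeholders, not the authors' exact implementation.

```python
# Minimal sketch of end-time sampling for a Neural ODE forward pass.
# Assumes the `torchdiffeq` package; architecture and hyperparameters
# (T, b) are hypothetical choices for illustration only.
import torch
import torch.nn as nn
from torchdiffeq import odeint


class ODEFunc(nn.Module):
    """Small network parameterizing the ODE dynamics f(t, y)."""

    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))

    def forward(self, t, y):
        return self.net(y)


def forward_pass(func, y0, T=1.0, b=0.5, training=True):
    # During training, integrate to a random end time
    # t1 ~ Uniform(T - b, T + b) instead of the fixed horizon T.
    # Requires b < T so that the end time stays positive.
    if training:
        t1 = torch.empty(1).uniform_(T - b, T + b).item()
    else:
        t1 = T  # at evaluation time, use the nominal end time
    t = torch.tensor([0.0, t1])
    # odeint returns the solution at every time in `t`; keep the final state.
    return odeint(func, y0, t)[-1]


func = ODEFunc(dim=2)
y0 = torch.randn(8, 2)  # batch of initial states
out = forward_pass(func, y0, training=True)
```

Since only the end point of the integration interval changes, the trick adds essentially no overhead beyond the single uniform sample per forward pass.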